Abstract. We give a Kleene-type operational characterization of Muller
context-free languages (MCFLs) of well-ordered and scattered words. 



1 Introduction 



A word, called an 'arrangement' in [12], is an isomorphism type of a countable
labeled linear order. Words form a generalization of the classic notions of finite
and $\omega$-words.

Finite automata on $\omega$-words have by now a vast literature, see [20] for a
comprehensive treatment. Finite automata acting on well-ordered words longer than
$\omega$ have been investigated in [2,9,10,22,23], to mention a few references. In the
last decade, the theory of automata on well-ordered words has been extended
to automata on all countable words, including scattered and dense words. In
[3,5,8], both operational and logical characterizations of the class of languages
of countable words recognized by finite automata were obtained.

Context-free grammars generating $\omega$-words were introduced in [11] and
subsequently studied in [7,19]. Context-free grammars generating arbitrary countable
words were defined in [13,14]. Actually, two types of grammars were defined,
context-free grammars with Büchi acceptance condition (BCFG), and
context-free grammars with Muller acceptance condition (MCFG). These grammars
generate the Büchi and the Muller context-free languages of countable words,
abbreviated as BCFLs and MCFLs. Every BCFL is clearly an MCFL, but there
exists an MCFL of well-ordered words that is not a BCFL, for example the set
of all countable well-ordered words over some alphabet. In fact, it was shown in
[13] that for every BCFL $L$ of well-ordered words there is an integer $n$ such that
the order type of the underlying linear order of every word in $L$ is bounded by
$\omega^n$.

A Kleene-type characterization of BCFLs of well-ordered and scattered words
was given in [16]. Here we provide a Kleene-type characterization of MCFLs of
well-ordered and scattered words. Before presenting the necessary preliminaries 
in detail, we give a formulation of our main result, at least in the well-ordered 
case. 



Suppose that $\Sigma$ is an alphabet, and let $\Sigma^\sharp$ denote the set of all (countable) words
over $\Sigma$. Let $P(\Sigma^\sharp)$ be the set of all subsets of $\Sigma^\sharp$. The set of $\mu\omega T_w$-expressions
over $\Sigma$ is defined by the following grammar:

$$T ::= a \mid \varepsilon \mid x \mid T + T \mid T \cdot T \mid \mu x.T \mid T^\omega$$

Here, each letter $a \in \Sigma$ denotes the language containing $a$ as its unique word,
while $\varepsilon$ denotes the language containing only the empty word. The symbols
$+$ and $\cdot$ are interpreted as set union and concatenation over $P(\Sigma^\sharp)$, and the
variables $x$ range over languages in $P(\Sigma^\sharp)$. The $\mu$-operator corresponds to taking
least fixed points. Finally, $\omega$ is interpreted as the $\omega$-power operation over $P(\Sigma^\sharp)$:
$L \mapsto L \cdot L \cdots$. An expression is closed if each variable occurs in the scope of a
least fixed-point operator. Each closed expression denotes a language in $P(\Sigma^\sharp)$.
Our main result in the well-ordered case, which is a corollary of Theorem 2, is:

Theorem 1. A language $L \subseteq \Sigma^\sharp$ is an MCFL of well-ordered words iff it is
denoted by some closed $\mu\omega T_w$-expression.

Example 1. The expression $\mu x.(x^\omega + a + b + \varepsilon)$ denotes the set of all well-ordered
words over the alphabet $\{a, b\}$.

It was shown in [16] that the syntactic fragment of the above expressions, with
the $\omega$-power operation restricted to closed expressions, characterizes the BCFLs
of well-ordered words. A similar, but more involved result holds for MCFLs of
scattered words, cf. Theorem 2. Both theorems were conjectured by the authors
of [H]. 



2 Notation 

2.1 Linear orderings 

A linear ordering is a pair $(I, <)$, where $I$ is a set and $<$ is an irreflexive transitive
trichotomous relation (i.e. a strict total ordering) on $I$. If $I$ is finite or countable,
we say that the ordering is finite or countable as well. In this paper, all orderings
are assumed to be countable. A good reference for linear orderings is [21].

An embedding of the linear ordering $(I, <)$ into $(J, \prec)$ is an order preserving
function $f : I \to J$, i.e. $x < y$ implies $f(x) \prec f(y)$ for each $x, y \in I$. If $f$
is surjective, we call it an isomorphism. Two linear orderings are said to be
isomorphic if there exists an isomorphism between them. Isomorphism between
linear orderings is an equivalence relation; classes of this equivalence relation are
called order types. If $I \subseteq J$ and $<$ is the restriction of $\prec$ onto $I$, then we say
that $(I, <)$ is a sub-ordering of $(J, \prec)$.

Examples of linear orderings are the ordering $(\mathbb{N}, <)$ of the positive integers,
the ordering $(\mathbb{N}_-, <)$ of the negative integers, the ordering $(\mathbb{Z}, <)$ of the integers
and the ordering $(\mathbb{Q}, <)$ of the rationals. The respective order types are denoted
$\omega$, $-\omega$, $\zeta$ and $\eta$. In order to ease notation, we write simply $I$ for $(I, <)$ if the
ordering $<$ is standard or known from the context.

An ordering is scattered if it does not have a sub-ordering of order type $\eta$,
otherwise it is quasi-dense. An ordering is a well-ordering if it does not have a
sub-ordering of order type $-\omega$. Order types of well-orderings are called ordinals.

When $(I, <)$ is an ordering and for each $i \in I$, $(J_i, <_i)$ is an ordering, then the
generalized sum $\sum_{i \in I} (J_i, <_i)$ is the disjoint union $\{(i, j) : i \in I, j \in J_i\}$ equipped
with the lexicographic ordering: $(i, j) < (i', j')$ iff $i < i'$, or $i = i'$ and $j <_i j'$. It
is known that if $(I, <)$ and the $(J_i, <_i)$ are scattered or well-ordered, then so is
the generalized sum. The operation of generalized sum can be extended to order
types since it preserves isomorphisms. For example, $\zeta = -\omega + \omega$. Ordinals are
also equipped with an exponentiation operator.

Hausdorff classified linear orderings into an infinite hierarchy. Following [17], we
present a variant of this hierarchy. Let $VD_0$ be the collection of all finite linear
orderings, and when $\alpha$ is some ordinal, let $VD_\alpha$ be the collection of all finite
sums of linear orderings of the form $\sum_{i \in \mathbb{Z}} (I_i, <_i)$, where for each integer $i \in \mathbb{Z}$,
$(I_i, <_i)$ is a member of $VD_{\alpha_i}$ for some ordinal $\alpha_i < \alpha$. According to a theorem
of Hausdorff (see e.g. [21], Thm. 5.24), a (countable) linear ordering $(I, <)$ is
scattered if and only if it belongs to $VD_\alpha$ for some (countable) ordinal $\alpha$; the
least such $\alpha$ is called the rank of $(I, <)$, denoted $\mathrm{rank}(I, <)$.



2.2 Words, tree domains, trees 



An alphabet is a finite nonempty set $\Sigma$ of symbols, usually called letters. A word
over $\Sigma$ is a linear ordering $(I, <)$ equipped with a labeling function $\lambda : I \to \Sigma$.
An embedding of words is a mapping preserving the order and the labeling;
a surjective embedding is an isomorphism. Order theoretic properties of the
underlying linear ordering of a word are transferred to the word. A word is finite
if its underlying linear order is finite, and an $\omega$-word if its underlying linear order
is a well-order of order type $\omega$. We usually identify isomorphic words and denote
by $\Sigma^\sharp$ the set of all words over $\Sigma$. As usual, we denote the collection of finite
and $\omega$-words over $\Sigma$ by $\Sigma^*$ and $\Sigma^\omega$, respectively. The length of a word $u \in \Sigma^*$
is denoted $|u|$. A language over $\Sigma$ is a subset of $\Sigma^\sharp$. As in the introduction, we
let $P(\Sigma^\sharp)$ denote the collection of all languages over $\Sigma$.

When $(I, <)$ is a linear ordering and $w_i = (J_i, <_i, \lambda_i)$ for $i \in I$ are words, then
we define their concatenation $\prod_{i \in I} w_i$ as the word with underlying linear order
$\sum_{i \in I} (J_i, <_i)$ and labeling $\lambda(i, j) = \lambda_i(j)$. When $I$ has two elements, we obtain
the usual notion of concatenation, denoted $u \cdot v$, or just $uv$. The operation of
concatenation is extended to languages in $P(\Sigma^\sharp)$:
$\prod_{i \in I} L_i = \{\prod_{i \in I} w_i : w_i \in L_i\}$. When $L, L_1, L_2 \subseteq \Sigma^\sharp$, then we define
$L_1 + L_2$ to be the set union and $L_1 L_2 = \{uv : u \in L_1, v \in L_2\}$. Moreover, we
define $L^\omega = \prod_{i \in \mathbb{N}} L$.

The set $P(\Sigma^\sharp)$ of languages over $\Sigma$, equipped with the inclusion order, is a
complete lattice. When $A$ is a set, a function $f : P(A)^n \to P(A)$ is monotone if
$A_i \subseteq A_i'$ for each $i \in [n]$ implies $f(A_1, \dots, A_n) \subseteq f(A_1', \dots, A_n')$. The following
fact is clear.

Lemma 1. The functions $+, \cdot : P(\Sigma^\sharp)^2 \to P(\Sigma^\sharp)$ and $^\omega : P(\Sigma^\sharp) \to P(\Sigma^\sharp)$ are
monotone.

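We will also consider pairs of words over an alphabet $\Sigma$, equipped with a finite
concatenation and an $\omega$-product operation. For pairs $(u, v)$, $(u', v')$ in $\Sigma^\sharp \times \Sigma^\sharp$,
we define the product $(u, v) \cdot (u', v')$ to be the pair $(uu', v'v)$, and when for each
$i \in \mathbb{N}$, $(u_i, v_i)$ is in $\Sigma^\sharp \times \Sigma^\sharp$, then we let $\prod_{i \in \mathbb{N}} (u_i, v_i)$ be the word
$(u_1 u_2 \cdots)(\cdots v_2 v_1)$, that is, $(\prod_{i \in \mathbb{N}} u_i)(\prod_{i \in \mathbb{N}_-} v_{-i})$.

Let $P(\Sigma^\sharp \times \Sigma^\sharp)$ denote the set of all subsets of $\Sigma^\sharp \times \Sigma^\sharp$. Then $P(\Sigma^\sharp \times \Sigma^\sharp)$
is naturally equipped with the operations of set union $L + L'$, concatenation
$L \cdot L' = \{(u, v) \cdot (u', v') : (u, v) \in L, (u', v') \in L'\}$ and Kleene star
$L^* = \{(\varepsilon, \varepsilon)\} \cup L \cup L^2 \cup \cdots$. We also define an $\omega$-power operation
$P(\Sigma^\sharp \times \Sigma^\sharp) \to P(\Sigma^\sharp)$ by $L^\omega = \{\prod_{i \in \mathbb{N}} (u_i, v_i) : (u_i, v_i) \in L\}$. When
$L_1, L_2 \subseteq \Sigma^\sharp$, let $L_1 \times L_2 = \{(u, v) : u \in L_1, v \in L_2\} \subseteq \Sigma^\sharp \times \Sigma^\sharp$.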
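The pair product can be illustrated on finite words. The following is a minimal sketch, not part of the original development: the names `pair_product` and `finite_product_word` are hypothetical, words are modeled as Python strings, and only finitely many pairs are multiplied, so it is a finite analogue of the $\omega$-product rather than the operation itself.

```python
from typing import Iterable, Tuple

Pair = Tuple[str, str]

def pair_product(p: Pair, q: Pair) -> Pair:
    """(u, v) . (u', v') = (u u', v' v): first components grow to the right,
    second components grow to the left."""
    (u, v), (u2, v2) = p, q
    return (u + u2, v2 + v)

def finite_product_word(pairs: Iterable[Pair]) -> str:
    """Finite analogue of the omega-product: (u1 ... un)(vn ... v1)."""
    acc: Pair = ("", "")  # the pair (epsilon, epsilon) is the unit
    for p in pairs:
        acc = pair_product(acc, p)
    return acc[0] + acc[1]
```

For instance, multiplying the pairs (a, x), (b, y), (c, z) in this order yields the word abc followed by zyx.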

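Lemma 2. The functions

$$\times : P(\Sigma^\sharp)^2 \to P(\Sigma^\sharp \times \Sigma^\sharp)$$
$$+, \cdot : P(\Sigma^\sharp \times \Sigma^\sharp)^2 \to P(\Sigma^\sharp \times \Sigma^\sharp)$$
$$^* : P(\Sigma^\sharp \times \Sigma^\sharp) \to P(\Sigma^\sharp \times \Sigma^\sharp)$$
$$^\omega : P(\Sigma^\sharp \times \Sigma^\sharp) \to P(\Sigma^\sharp)$$

are monotone.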

We will use Lemma 1 and Lemma 2 in the following context. Suppose that for
each $i \in [n] = \{1, \dots, n\}$, $f_i : P(\Sigma^\sharp)^{n+p} \to P(\Sigma^\sharp)$ is a function that can be
constructed by function composition from the above functions, the projection
functions and constant functions. Let $f = (f_1, \dots, f_n) : P(\Sigma^\sharp)^{n+p} \to P(\Sigma^\sharp)^n$
be the target tupling of the $f_i$. Then $f$ is a monotone function, and by Tarski's
fixed point theorem, for each $y \in P(\Sigma^\sharp)^p$ there is a least solution of the fixed
point equation $x = f(x, y)$ in the variable $x$ ranging over $P(\Sigma^\sharp)^n$. This least
fixed point, denoted $\mu x.f(x, y)$, gives rise to a function $P(\Sigma^\sharp)^p \to P(\Sigma^\sharp)^n$ in the
parameter $y$. It is known that this function is also monotone, see e.g. [5].
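On a finite lattice the least fixed point of a monotone function can be reached by iterating from the bottom element. The sketch below illustrates this under simplifying assumptions that are ours, not the paper's: the lattice is a finite powerset of strings (enforced by the hypothetical length bound `MAX_LEN`), whereas $P(\Sigma^\sharp)$ is infinite and in general requires transfinite iteration.

```python
from typing import Callable, FrozenSet

Lang = FrozenSet[str]

def least_fixed_point(f: Callable[[Lang], Lang]) -> Lang:
    """Iterate a monotone f from the empty set until it stabilizes.
    Terminates whenever f maps into a finite lattice."""
    x: Lang = frozenset()
    while True:
        nxt = f(x)
        if nxt == x:
            return x
        x = nxt

MAX_LEN = 6  # assumption: truncate words to keep the lattice finite

def step(x: Lang) -> Lang:
    """Monotone map x |-> {epsilon} + a.x.b, restricted to words of length <= MAX_LEN."""
    return frozenset({""} | {"a" + w + "b" for w in x if len(w) + 2 <= MAX_LEN})
```

Here `least_fixed_point(step)` is the truncation of the least solution of $x = \varepsilon + a \cdot x \cdot b$ to words of length at most six.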

A tree domain is a prefix closed nonempty (but possibly infinite) subset of $\mathbb{N}^*$.
Elements of a tree domain $T$ are also called nodes of $T$. When $x$ and $x \cdot i$ are
nodes of $T$ for $x \in \mathbb{N}^*$ and $i \in \mathbb{N}$, then $x \cdot i$ is a child of $x$. A descendant of a node
$x$ is a node of the form $x \cdot y$, where $y \in \mathbb{N}^*$. Nodes of $T$ having no child are the
leaves of $T$. The leaves, equipped with the order inherited from the lexicographic
ordering of $\mathbb{N}^*$, form the frontier of $T$, denoted $\mathrm{fr}(T)$. An inner node of $T$ is a
non-leaf node. Subsets of a tree domain $T$ which themselves are tree domains
are called prefixes of $T$. A path of a tree domain $T$ is a prefix of $T$ such that each
node has at most one child. A path can be identified with the unique sequence
$w$ in the set $\mathbb{N}^{\leq\omega}$ of all sequences over $\mathbb{N}$ of length at most $\omega$ such that the set of nodes
of the path consists of the finite prefixes of $w$. A path $\pi$ of $T$ is maximal if no
path of $T$ contains $\pi$ properly. When $T$ is a tree domain and $x \in T$ is a node of
$T$, then the sub-tree domain $T|_x$ of $T$ is the set $\{y : xy \in T\}$. A tree domain $T$
is locally finite if each node has a descendant which is a leaf.

A tree over an alphabet $\Delta$ is a mapping $t : \mathrm{dom}(t) \to \Delta \cup \{\varepsilon\}$, where $\mathrm{dom}(t)$
is a tree domain, such that inner vertices are mapped to letters in $\Delta$. Notions
such as nodes, paths etc. of tree domains are lifted to trees. When $\pi$ is a path
of the tree $t$, then $\mathrm{labels}(\pi) = \{t(u) : u \in \pi\}$ is the set of labels of the nodes
of $\pi$, and $\mathrm{infLabels}(\pi)$ is the set of labels occurring infinitely often. For a path
$\pi$, $\mathrm{head}(\pi)$ denotes the minimal node $x$ of $\pi$ (with respect to the prefix order)
with $\mathrm{infLabels}(\pi) = \mathrm{labels}(\pi|_x)$, if $\pi$ is infinite; otherwise $\mathrm{head}(\pi)$ is the last
node of $\pi$. The labeled frontier word $\mathrm{lfr}(t)$ of a tree $t$ is determined by the leaves
not labeled by $\varepsilon$, equipped with the lexicographic ordering of $\mathbb{N}^*$ and
the labeling function of $t$. It is worth observing that when $\pi = x_0, x_1, \dots$ is an
infinite path of a tree $t$ and for each $i$, $\alpha_i$ ($\beta_i$, resp.) is the word determined
by the leaf labels of the descendants of $x_i$ to the left (right, resp.) of $x_{i+1}$ (i.e.
if $x_{i+1}$ is the $j$th child of $x_i$, then $\alpha_i = \mathrm{lfr}(t|_{x_i \cdot 1}) \cdot \mathrm{lfr}(t|_{x_i \cdot 2}) \cdots \mathrm{lfr}(t|_{x_i \cdot (j-1)})$ and
similarly for $\beta_i$), then $\mathrm{lfr}(t) = \prod_{i \geq 0} (\alpha_i, \beta_i)$.
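For finite trees the labeled frontier can be computed directly from the definitions. The following sketch rests on assumptions of ours: a tree is a dictionary from nodes (tuples of positive integers, the empty tuple being the root) to labels, the label $\varepsilon$ is modeled by the empty string, and the name `labeled_frontier` is hypothetical. Python compares tuples lexicographically, which matches the ordering of $\mathbb{N}^*$ on leaves.

```python
from typing import Dict, Tuple

Node = Tuple[int, ...]

def labeled_frontier(tree: Dict[Node, str]) -> str:
    """Concatenate leaf labels in lexicographic order of the leaves.
    Leaves labeled epsilon (the empty string) contribute nothing."""
    parents = {x[:-1] for x in tree if x}        # nodes that have a child
    leaves = sorted(x for x in tree if x not in parents)
    return "".join(tree[x] for x in leaves)

# A finite tree with root S, children a, S, b, where the inner S has one epsilon child.
example_tree: Dict[Node, str] = {
    (): "S",
    (1,): "a",
    (2,): "S",
    (3,): "b",
    (2, 1): "",
}
```

The frontier of `example_tree` consists of the leaves (1), (2,1), (3), so its labeled frontier word is ab.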



2.3 Muller context-free languages of scattered words 

A Muller context-free grammar, or MCFG for short, is a system $G = (V, \Sigma, R, S, \mathcal{F})$,
where $V$ is the alphabet of nonterminals, $\Sigma$ is the alphabet of terminals, $\Sigma \cap V = \emptyset$,
$R$ is the finite set of productions of the form $A \to \alpha$ with $A \in V$ and
$\alpha \in (\Sigma \cup V)^*$, $S \in V$ is the start symbol and $\mathcal{F} \subseteq P(V)$ is the set of nonempty
accepting sets.

A derivation tree of the above grammar $G$ is a tree $t : \mathrm{dom}(t) \to V \cup \Sigma \cup \{\varepsilon\}$
satisfying the following conditions:

1. For each inner node $x$ of $t$ there exists a rule $X \to X_1 \dots X_n$ in $R$ such that
$t(x) = X$, the children of $x$ are exactly $x \cdot 1, \dots, x \cdot n$, and for each $i \in [n]$,
$t(x \cdot i) = X_i$, so that when $n = 0$, $x$ has a single child $x \cdot 1$ labeled $\varepsilon$;

2. For each infinite path $\pi$ of $t$, $\mathrm{infLabels}(\pi)$ is an accepting set of $G$.

A derivation tree is complete if its leaves are all labeled in $\Sigma \cup \{\varepsilon\}$. If $t$ is a
derivation tree having root symbol $t(\varepsilon) = A$, then we say that $t$ is an $A$-tree.



The language $L(G, A) \subseteq \Sigma^\sharp$ generated from $A \in V$ is the set of frontier words
of complete $A$-trees. The language $L(G)$ generated by $G$ is $L(G, S)$. An MCFL
is a language generated by some MCFG.

Example 2. If $G = (\{S, I\}, \{a, b\}, R, S, \{\{I\}\})$, with

$$R = \{S \to a,\ S \to b,\ S \to \varepsilon,\ S \to I,\ I \to SI\},$$
then $L(G)$ consists of all the well-ordered words over $\{a, b\}$.

Example 3. If $G = (\{S, I\}, \{a, b\}, R, S, \{\{I\}\})$, with

$$R = \{S \to a,\ S \to b,\ S \to \varepsilon,\ S \to I,\ I \to SIS\},$$
then $L(G)$ consists of all the scattered words over $\{a, b\}$.

Let $L \subseteq \Sigma^\sharp$ be an MCFL consisting of scattered words only and $G = (V, \Sigma, R, S, \mathcal{F})$
an MCFG with $L(G) = L$. We may assume that $G$ is in normal form [13]. Among
the properties of this normal form we will use the following ones (see [13], Prop.
14) frequently:

— For every derivation tree there is a locally finite derivation tree with the 
same root symbol and same labeled frontier. 

— The frontier of each derivation tree is scattered. 

In the rest of the paper, we fix an MCFG $G = (V, \Sigma, R, S, \mathcal{F})$ in normal
form generating only scattered words.

When $t$ is a derivation tree, then we define $\mathrm{rank}(t) = \mathrm{rank}(\mathrm{fr}(t))$. For a derivation
tree $t$, let $\mathrm{maxNodes}(t)$ be the prefix of $\mathrm{dom}(t)$ consisting of the nodes having
maximal rank, i.e. $\mathrm{maxNodes}(t) = \{x \in \mathrm{dom}(t) : \mathrm{rank}(t|_x) = \mathrm{rank}(t)\}$. Suppose
that $t$ is locally finite. It is known (see e.g. [15], proof of Proposition 1, paragraph
4) that in this case $\mathrm{maxNodes}(t)$ is the union of finitely many maximal paths.
Clearly, the set $\{\pi_1, \dots, \pi_n\}$ of these paths is unique. Let $\mathrm{level}(t)$ stand for the
above $n$, the number of maximal paths covering $\mathrm{maxNodes}(t)$. Also, let $\mathrm{branch}(t)$
stand for the longest common prefix of the paths $\pi_1, \dots, \pi_n$ (which is a finite
word if $\mathrm{level}(t) > 1$ and is $\pi_1$ if $\mathrm{level}(t) = 1$).

We say that a (not necessarily locally finite) derivation tree $t$ is simple if $\mathrm{maxNodes}(t)$
contains a single infinite path $\pi$ and if $\mathrm{infLabels}(\pi) = \mathrm{labels}(\pi)$, i.e. $\mathrm{head}(\pi) = \varepsilon$.
(When $t$ is additionally locally finite, then this path $\pi$ contains all nodes of
$\mathrm{maxNodes}(t)$.) Such a path is called the central path of $t$. If $t$ is a simple $A$-tree
and $F$ is the set of labels of its central path, then we call $t$ an $F$-simple $A$-tree.

3 The main result 

For locally finite complete derivation trees $t'$ and $t$, let $t' \prec t$ if one of the
following conditions holds:

1. $\mathrm{rank}(t') < \mathrm{rank}(t)$;

2. $\mathrm{rank}(t') = \mathrm{rank}(t)$ and $\mathrm{level}(t') < \mathrm{level}(t)$;

3. $\mathrm{rank}(t') = \mathrm{rank}(t)$, $\mathrm{level}(t') = \mathrm{level}(t) > 1$ and $|\mathrm{branch}(t')| < |\mathrm{branch}(t)|$;

4. $\mathrm{rank}(t') = \mathrm{rank}(t)$, $\mathrm{level}(t') = \mathrm{level}(t) = 1$, that is, the set of nodes of
maximal rank is a path $\pi$ in $t$ and a path $\pi'$ in $t'$, and
$|\mathrm{head}(\pi')| < |\mathrm{head}(\pi)|$.
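The four conditions amount to a lexicographic comparison of a triple of measures. The sketch below makes this explicit under assumptions of ours: a tree is summarized by a hypothetical record `Measure`, and ranks are modeled as Python integers, which only covers finite ranks (in general ranks are countable ordinals).

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Measure:
    rank: int        # assumption: finite rank; in general a countable ordinal
    level: int       # number of maximal paths covering maxNodes(t)
    branch_len: int  # |branch(t)|, only relevant when level > 1
    head_len: int    # |head(pi)|, only relevant when level == 1

def key(m: Measure) -> Tuple[int, int, int]:
    """The third component is |branch| for level > 1 and |head| for level == 1."""
    third = m.branch_len if m.level > 1 else m.head_len
    return (m.rank, m.level, third)

def precedes(m1: Measure, m2: Measure) -> bool:
    """True iff the tree measured by m1 is strictly below the one measured by m2."""
    return key(m1) < key(m2)
```

Since equal levels are either both greater than one or both equal to one, comparing the third component is only ever done between like quantities, as in conditions 3 and 4.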

Lemma 3. The relation $\prec$ is a well-partial order (wpo) of locally finite
complete derivation trees. The minimal elements of this wpo are the one-node trees
corresponding to the elements of $\Sigma \cup \{\varepsilon\}$. Suppose that $t$ is a locally finite
complete derivation tree and $t' = t|_x$ is a proper subtree of $t$, so that $x \neq \varepsilon$. If $t$ is
not simple, or if $t$ is simple but $x$ does not belong to the central path of $t$, then
$t' \prec t$.

Proof. It is clear that $\prec$ is irreflexive. To prove that it is transitive, suppose
that $t'' \prec t'$ and $t' \prec t$. If $\mathrm{rank}(t'') < \mathrm{rank}(t)$, then clearly $t'' \prec t$. Suppose
that $\mathrm{rank}(t'') = \mathrm{rank}(t)$. Then also $\mathrm{rank}(t'') = \mathrm{rank}(t') = \mathrm{rank}(t)$. If $\mathrm{level}(t'') <
\mathrm{level}(t)$ then $t'' \prec t$ again. Thus, we may suppose that $\mathrm{level}(t'') = \mathrm{level}(t)$, so that
$\mathrm{level}(t'') = \mathrm{level}(t') = \mathrm{level}(t) = n$. Now there are two cases. If $n > 1$, then, since
$t'' \prec t'$ and $t' \prec t$, we know that $|\mathrm{branch}(t'')| < |\mathrm{branch}(t')| < |\mathrm{branch}(t)|$ and
thus $t'' \prec t$. If $n = 1$, then the maximal nodes form a single maximal path in each
of the trees $t''$, $t'$ and $t$. Let us denote these paths by $\pi''$, $\pi'$ and $\pi$, respectively.
As $t'' \prec t'$ and $t' \prec t$, we have that $|\mathrm{head}(\pi'')| < |\mathrm{head}(\pi')| < |\mathrm{head}(\pi)|$, so that
$t'' \prec t$ again.

The fact that there is no infinite decreasing sequence of locally finite complete
derivation trees with respect to the relation $\prec$ is clear, since every set of ordinals
is well-ordered.

Suppose now that $t$ is a locally finite complete derivation tree which has at
least two nodes. By assumption, $t$ has a leaf node $x$. Let $t' = t|_x$. If $\mathrm{rank}(t') <
\mathrm{rank}(t)$ then $t' \prec t$. Otherwise, $\mathrm{rank}(t') = \mathrm{rank}(t) = 0$ and $t$ is necessarily finite
(since the frontier of an infinite complete derivation tree is infinite). Clearly,
$\mathrm{maxNodes}(t)$ is the set of all nodes of $t$, and either $\mathrm{level}(t') = 1 < \mathrm{level}(t)$,
or $\mathrm{level}(t') = \mathrm{level}(t) = 1$. In the latter case, $t$ has a single maximal path $\pi$,
and $|\mathrm{head}(\pi')| = 0 < |\mathrm{head}(\pi)|$ for the single maximal path $\pi'$ of $t'$. In either
case, $t' \prec t$. Thus, no locally finite complete derivation tree having more than
one node is minimal. On the other hand, all one-node complete derivation trees
corresponding to the elements of $\Sigma \cup \{\varepsilon\}$ are clearly minimal (and locally finite).

To prove the last claim, suppose that $t$ is a locally finite complete derivation
tree and $t' = t|_x$. If $\mathrm{rank}(t') < \mathrm{rank}(t)$, we are done. Otherwise, $\mathrm{rank}(t') =
\mathrm{rank}(t)$ and $x$ is a member of $\mathrm{maxNodes}(t)$. Thus, if $\pi$ is a maximal path of
$\mathrm{maxNodes}(t')$, then $x\pi$ is a maximal path of $\mathrm{maxNodes}(t)$. Hence $\mathrm{level}(t') \leq
\mathrm{level}(t)$. If $\mathrm{level}(t') < \mathrm{level}(t)$, we are done. Otherwise, $\mathrm{level}(t') = \mathrm{level}(t)$ and
$\mathrm{maxNodes}(t) = x \cdot \mathrm{maxNodes}(t')$.

Now there are two cases. 



1. If $\mathrm{level}(t) > 1$, then $\mathrm{branch}(t) = x \cdot \mathrm{branch}(t')$, thus $|\mathrm{branch}(t')| < |\mathrm{branch}(t)|$
and $t' \prec t$.

2. Suppose that $\mathrm{level}(t) = 1$, and let $\pi$ denote the unique maximal path of $t$
whose nodes form the set $\mathrm{maxNodes}(t)$. Since $\mathrm{rank}(t') = \mathrm{rank}(t)$, we have
that $x$ belongs to $\pi$ and, by assumption, $t$ is not simple. Since $t$ is not simple
and has at least two nodes, $\mathrm{head}(\pi) \neq \varepsilon$ and $|\mathrm{head}(\pi')| < |\mathrm{head}(\pi)|$, where
$\pi'$ is the unique maximal path of $t'$ whose nodes form the set $\mathrm{maxNodes}(t')$.
(Actually $\pi'$ is determined by the proper suffix $\pi|_x$ of $\pi$.) □

Now we define certain ordinary $\omega$-regular languages [18,20] corresponding to
central paths of simple derivation trees. Let $\Gamma$ stand for the (finite) set consisting
of those triplets

$$(\alpha, B, \beta) \in (V \cup \Sigma)^* \times V \times (V \cup \Sigma)^*$$

for which $\alpha B \beta$ occurs as the right-hand side of a production of $G$. For any
nonterminal $A \in V$ and accepting set $F \in \mathcal{F}$, let $R_{A,F} \subseteq \Gamma^\omega$ stand for the set of
$\omega$-words over $\Gamma$ accepted by the deterministic (partial) Muller (word) automaton
$(F, \Gamma, \delta, A, \{F\})$, with $B = \delta(C, (\alpha, D, \beta))$ if and only if $D = B$ and $C \to \alpha B \beta$
is a production of $G$. By definition, each $R_{A,F}$ is an $\omega$-regular set which can
be built from singleton sets corresponding to the elements of $\Gamma$ by the usual
regular operations and the $\omega$-power operation (actually, since every state has to
be visited infinitely many times, $R_{A,F}$ can be written as the $\omega$-power of a regular
language of finite words over $\Gamma$).
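The alphabet of triplets and the partial transition function can be read off the productions. The following sketch assumes a representation of ours: a production is a pair of a left-hand side and a tuple of symbols, and the function name `triplets_and_delta` is hypothetical. It computes the transitions over all of $V$; the automaton for a given $F$ is obtained by keeping only the transitions between states in $F$.

```python
from typing import Dict, Iterable, Set, Tuple

Symbol = str
Rhs = Tuple[Symbol, ...]
Triplet = Tuple[Rhs, Symbol, Rhs]

def triplets_and_delta(
    productions: Iterable[Tuple[Symbol, Rhs]], nonterminals: Set[Symbol]
) -> Tuple[Set[Triplet], Dict[Tuple[Symbol, Triplet], Symbol]]:
    """Collect every triplet (alpha, B, beta) with alpha B beta a right-hand side,
    and record delta(C, (alpha, B, beta)) = B for each production C -> alpha B beta."""
    gamma: Set[Triplet] = set()
    delta: Dict[Tuple[Symbol, Triplet], Symbol] = {}
    for lhs, rhs in productions:
        for i, sym in enumerate(rhs):
            if sym in nonterminals:
                letter = (rhs[:i], sym, rhs[i + 1:])
                gamma.add(letter)
                delta[(lhs, letter)] = sym
    return gamma, delta

# The grammar of Example 3: S -> a | b | epsilon | I, I -> S I S.
example3 = [("S", ("a",)), ("S", ("b",)), ("S", ()), ("S", ("I",)), ("I", ("S", "I", "S"))]
gamma3, delta3 = triplets_and_delta(example3, {"S", "I"})
```

For the accepting set $\{I\}$ of Example 3 the only transition staying inside the state set is the one on the letter $(S, I, S)$, from $I$ to $I$.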

Members of $R_{A,F}$ correspond to central paths of $F$-simple $A$-trees in the following
sense. Given $w = (\alpha_1, A_1, \beta_1)(\alpha_2, A_2, \beta_2) \cdots \in R_{A,F}$, we define an $F$-simple
$A$-tree $t_w$ of $G$ as follows. The nodes $x_0, x_1, \dots$ of the central path of $t_w$ are $x_0 = \varepsilon$,
and $x_i = x_{i-1} \cdot (|\alpha_i| + 1)$, for $i > 0$. Each $x_i$ has $|\alpha_{i+1} A_{i+1} \beta_{i+1}|$ children,
respectively labeled by the letters of the word $\alpha_{i+1} A_{i+1} \beta_{i+1}$. Nodes not on the
central path of $t_w$ are leaf nodes.

It is straightforward to see the following claims: 

1. For each $w \in R_{A,F}$, $t_w$ is an $F$-simple $A$-tree.

2. Every $F$-simple $A$-tree has a prefix of the form $t_w$, for some $w \in R_{A,F}$. Thus,
every such tree can be constructed by choosing an appropriate $w \in R_{A,F}$,
and substituting a derivation tree $t_x$ with root symbol $t_w(x)$ for each leaf $x$
of $t_w$.

Moreover, it is clear that when $w = (\alpha_1, A_1, \beta_1)(\alpha_2, A_2, \beta_2) \cdots$, then $\mathrm{lfr}(t_w)$ is

$$\Big(\prod_{i \in \mathbb{N}} \alpha_i\Big) \cdot \Big(\prod_{i \in \mathbb{N}_-} \beta_{-i}\Big).$$

Let us assign a variable $X_A$ to each $A \in V$, and let $X$ be the set of all variables.
For each ordinary regular expression $r$ over $\Gamma$, we define an expression (term) $\bar{r}$
over $\Sigma \cup X$ involving the function symbols $\times$, $+$, $\cdot$. To this end, when $\alpha$ is a word
in $(\Sigma \cup V)^*$, let $\bar{\alpha}$ be the word in $(X \cup \Sigma)^*$ obtained by replacing each occurrence
of a nonterminal $A$ by the variable $X_A$. Then, for a letter $\gamma = (\alpha, A, \beta) \in \Gamma$,
define $\bar{\gamma} = \bar{\alpha} \times \bar{\beta}$. To obtain $\bar{r}$, we replace each occurrence of a letter $\gamma$ in $r$ by $\bar{\gamma}$.



When $A$ is a nonterminal and $A \in F$ for some $F \in \mathcal{F}$, consider an ordinary
regular expression $r_{A,F}$ over $\Gamma$ such that $r_{A,F}^\omega$ denotes the set $R_{A,F}$ (defined
above) of all $\omega$-words corresponding to central paths of $F$-simple $A$-trees. Then
consider the following system of equations $E_G$ associated with $G$ in the variables
$X$:

$$X_A = \sum_{A \to \alpha \in R} \bar{\alpha} + \sum_{A \in F \in \mathcal{F}} (\bar{r}_{A,F})^\omega.$$

Example 4. The system of equations $E_G$ associated with the grammar in
Example 3 is:

$$X_S = a + b + \varepsilon + X_I$$
$$X_I = X_S X_I X_S + (X_S \times X_S)^\omega$$

As usual, we can associate a function $f_G : P(\Sigma^\sharp)^X \to P(\Sigma^\sharp)^X$ with $E_G$. By
Lemmas 1 and 2 and using the facts that the projections are monotone and that
monotone functions are closed under function composition, we have that $f_G$ is
monotone. Thus, $f_G$ has a least fixed point.

Proposition 1. For each $A \in V$, the corresponding component of the least fixed
point solution of the system $E_G$ is the language $L(G, A)$ of all words derivable
from $A$.

Proof. The fact that the languages $L(G, A)$, $A \in V$, form a solution is clear
from the definition of $E_G$. Let us also define $L(G, a) = \{a\}$, for each $a \in \Sigma \cup \{\varepsilon\}$.
Suppose that the family of languages $L_A$, $A \in V$ is another solution, and let
$L_a = \{a\}$ for $a \in \Sigma \cup \{\varepsilon\}$. We want to show that if $t$ is a locally finite complete
$A$-tree with $\mathrm{lfr}(t) = u$, then $u \in L_A$, for each $A \in \Sigma \cup \{\varepsilon\} \cup V$. We apply
well-founded induction with respect to the wpo $\prec$.

For the base case, if $t$ consists of a single node, then $A = a \in \Sigma \cup \{\varepsilon\}$, $u = a$,
and our claim is clear. Otherwise, there are two cases: either $t$ is a simple tree,
or not.

If $t = A(t_1, \dots, t_n)$ is not simple, then we have $t_i \prec t$ for each $i \in [n]$ by Lemma 3.
Let $A_i$ be the root symbol of $t_i$ and $u_i$ the labeled frontier word of $t_i$ for each $i$.
By the induction hypothesis, each $u_i$ is a member of $L_{A_i}$. Since $t$ is a derivation
tree, $A \to A_1 \dots A_n$ is a production of $G$. Thus, by the construction of $E_G$,
$u = u_1 \dots u_n \in L_A$.

Otherwise, if $t$ is an $F$-simple $A$-tree for some $F \in \mathcal{F}$ and $A \in V$, then $t$ can be
constructed from a tree $t_w$ with $w \in R_{A,F}$ by replacing each leaf node $x$ of $t_w$
by some complete derivation tree $t_x$ with root symbol $t_w(x)$. Since such leaves
are not on the central path of $t$, we have $t_x \prec t$ for each $x$, again by Lemma 3.
Applying the induction hypothesis, we get that the labeled frontier word $u_x$ of
each $t_x$ is a member of $L_{t_w(x)}$. Thus, by the construction of $E_G$, $u$ is a member
of $L_A$. □

It is well-known, cf. |4I1) or [BJ, Chapter 8, Theorem 2.15 and Chapter 6, Section 
8.1, Equation (3.2), that when $L$, $L'$, $L''$ are complete lattices and $f : L \times L' \times L'' \to L$
and $g : L \times L' \times L'' \to L'$ are monotone functions, then the least solution (in
the parameter $z$) of the system of equations

$$x = f(x, y, z)$$
$$y = g(x, y, z)$$

can be obtained by Gaussian elimination as

$$x = \mu x.f(x, \mu y.g(x, y, z), z)$$
$$y = \mu y.g(\mu x.f(x, \mu y.g(x, y, z), z), y, z)$$
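The elimination rule can be checked experimentally on a finite lattice. The sketch below compares the nested least fixed points with a joint iteration of the two equations; the lattice (subsets of the integers below a hypothetical bound `N`) and the monotone functions `f` and `g` are illustrative assumptions and are not taken from the paper.

```python
from typing import Callable, FrozenSet, Tuple

S = FrozenSet[int]
N = 8  # assumption: work inside the finite lattice of subsets of {0, ..., N-1}

def lfp(h: Callable[[S], S]) -> S:
    """Least fixed point of a monotone map on a finite powerset lattice."""
    x: S = frozenset()
    while True:
        nxt = h(x)
        if nxt == x:
            return x
        x = nxt

def f(x: S, y: S, z: S) -> S:
    return frozenset(z | {n + 1 for n in y if n + 1 < N})

def g(x: S, y: S, z: S) -> S:
    return frozenset({n + 2 for n in x if n + 2 < N} | {n + 4 for n in y if n + 4 < N})

def joint_lfp(z: S) -> Tuple[S, S]:
    """Least solution of x = f(x,y,z), y = g(x,y,z) by simultaneous iteration."""
    x, y = frozenset(), frozenset()
    while True:
        nx, ny = f(x, y, z), g(x, y, z)
        if (nx, ny) == (x, y):
            return x, y
        x, y = nx, ny

def nested_lfp(z: S) -> Tuple[S, S]:
    """The same solution via x = mu x.f(x, mu y.g(x,y,z), z) and the formula for y."""
    x_star = lfp(lambda x: f(x, lfp(lambda y: g(x, y, z)), z))
    y_star = lfp(lambda y: g(x_star, y, z))
    return x_star, y_star
```

Both procedures return the same pair of sets for every choice of the parameter `z`, as the elimination rule predicts.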

Using this fact and Proposition 1 we obtain our final result.

Let the set of $\mu\omega T_s$-expressions over the alphabet $\Sigma$ be defined by the following
grammar (with $T$ being the initial nonterminal):

$$T ::= a \mid \varepsilon \mid x \mid T + T \mid T \cdot T \mid \mu x.T \mid P^\omega$$
$$P ::= T \times T \mid P + P \mid P \cdot P \mid P^*$$

Here, $a \in \Sigma$ and $x \in X$ for an infinite countable set $X$ of variables. An occurrence
of a variable is free if it is not in the scope of a $\mu$-operation, and bound if it is
not free. A closed expression does not have free variable occurrences. The
semantics of these expressions are defined as expected using the monotone functions
over $P(\Sigma^\sharp)$ and $P(\Sigma^\sharp \times \Sigma^\sharp)$ introduced earlier. When the free variables of an
expression form the set $Y$, then the expression denotes a language in $P((\Sigma \cup Y)^\sharp)$.

Remark 1. Actually, $\varepsilon$ is redundant, as it is expressible by $((\mu x.x \times \mu x.x)^*)^\omega$. We
do not need a constant denoting the empty set of pairs since it is expressible
by $(\mu x.x) \times (\mu x.x)$.

Theorem 2. A language $L \subseteq \Sigma^\sharp$ is an MCFL of scattered words if and only if
it can be denoted by a closed $\mu\omega T_s$-expression.

Proof. It is easy to show that each expression denotes an MCFL of scattered
words. One uses the following facts, where $\Delta$ denotes an alphabet and $x, \# \notin \Delta$.

1. The set of MCFLs (of scattered words) over $\Delta$ is closed under $+$ and $\cdot$.

2. If $L, L' \subseteq \Delta^\sharp$ are MCFLs (of scattered words), then $L\#L' \subseteq (\Delta \cup \{\#\})^\sharp$ is
an MCFL (of scattered words).

3. Suppose that $L, L' \subseteq \Delta^\sharp\#\Delta^\sharp$ are MCFLs (of scattered words). Then

$$\{uv\#v'u' : u\#u' \in L,\ v\#v' \in L'\} \subseteq \Delta^\sharp\#\Delta^\sharp$$
is an MCFL (of scattered words).

4. Suppose that $L \subseteq \Delta^\sharp\#\Delta^\sharp$ is an MCFL (of scattered words). Then

$$\{u_1 \dots u_n \# v_n \dots v_1 : n \geq 0,\ u_i\#v_i \in L\} \subseteq \Delta^\sharp\#\Delta^\sharp$$

is an MCFL (of scattered words).

5. Suppose that $L \subseteq \Delta^\sharp\#\Delta^\sharp$ is an MCFL (of scattered words). Then

$$\{(u_1 u_2 \dots)(\dots v_2 v_1) : u_i\#v_i \in L\} \subseteq \Delta^\sharp$$

is an MCFL (of scattered words).

6. Suppose that $L \subseteq (\Delta \cup \{x\})^\sharp$ is an MCFL (of scattered words). Then, with
respect to set inclusion, there is a least language $L' \subseteq \Delta^\sharp$ such that
$L[x \mapsto L'] = L'$, and this language $L'$ is an MCFL (of scattered words). (Here,
$L[x \mapsto L']$ is the language obtained from $L$ by 'substituting' $L'$ for $x$.)

It is known (see [14]) that the class of MCFLs is (effectively) closed under
substitution and that every context-free language of finite words (in particular, $\{a, b\}$,
$\{ab\}$ or $\{a\#b\}$) is an MCFL, showing Items 1-3 above.

For Items 4 and 5, let $G = (V, \Delta \cup \{\#\}, R, S, \mathcal{F})$ be an MCFG generating the
MCFL $L \subseteq \Delta^\sharp\#\Delta^\sharp$. Then

$$G_1 = (V \cup \{\#\},\ \Delta \cup \{\#'\},\ R \cup \{\# \to \#',\ \# \to S\},\ \#,\ \mathcal{F})$$

generates the MCFL $L_1 = \{u_1 \dots u_n \#' v_n \dots v_1 : n \geq 0,\ u_i\#v_i \in L\}$, showing
Item 4 (applying the substitution $\#' \mapsto \{\#\}$) and

$$G_2 = (V \cup \{\#\},\ \Delta,\ R \cup \{\# \to S\},\ \#,\ \mathcal{F} \cup \{H \cup \{\#\} : H \subseteq V\})$$

generates the MCFL defined in Item 5.

Finally, let $G = (V, \Delta \cup \{x\}, R, S, \mathcal{F})$ be an MCFG generating $L \subseteq (\Delta \cup \{x\})^\sharp$.
Then

$$G_3 = (V \cup \{x\},\ \Delta,\ R \cup \{x \to S\},\ x,\ \mathcal{F})$$
generates the language $L'$ of Item 6.
The other direction follows from Proposition 1. □

Example 5. The expression $\mu x.((x \times x)^\omega + a + b + \varepsilon)$ denotes the set of all scattered
words over the alphabet $\{a, b\}$.

Example 6. Let $L \subseteq \{a, b\}^\sharp$ be the language of all words $w$ such that the word
obtained from $w$ by removing all occurrences of letter $b$ is well-ordered, as is
the 'mirror image' of the word obtained by removing all occurrences of letter
$a$. It is not difficult to show that each word in $L$ contains only a finite number
of 'alternations' between $a$ and $b$. Using this fact, an MCFG generating $L$ is:
$G = (\{S, A, B, I, J\}, \Sigma, R, S, \{\{I\}, \{J\}\})$ with $R$ consisting of the productions

$$S \to AS \mid BS \mid \varepsilon$$
$$A \to a \mid \varepsilon \mid I$$
$$I \to AI$$
$$B \to b \mid \varepsilon \mid J$$
$$J \to JB$$

Using the algorithm described above (with some simplification), an expression
for $L$ is:

$$t_S = \mu x_S.((t_A + t_B) x_S + \varepsilon)$$
with

$$t_A = \mu x_A.(a + \varepsilon + (x_A \times \varepsilon)^\omega)$$
$$t_B = \mu x_B.(b + \varepsilon + (\varepsilon \times x_B)^\omega).$$

We restate Theorem 1 and show that it is a corollary of Theorem 2.

Theorem. A language $L \subseteq \Sigma^\sharp$ is an MCFL of well-ordered words iff it is denoted
by some closed $\mu\omega T_w$-expression.

Proof. Recall that the set of $\mu\omega T_w$-expressions over an alphabet $\Sigma$ is defined by
the grammar

$$T ::= a \mid \varepsilon \mid x \mid T + T \mid T \cdot T \mid \mu x.T \mid T^\omega$$

where $a \in \Sigma$ and $x$ ranges over the set $X$ of variables; moreover, an expression $t$
is closed if each occurrence of a variable $x$ in $t$ is within the scope of some prefix
$\mu x$. Below we will sometimes view the construct $t^\omega$ as a shorthand for $(t \times \varepsilon)^\omega$.

For one direction, we show by structural induction that for a $\mu\omega T_w$-expression
$t$ with free variables in $X$, the language $|t| \subseteq (\Sigma \cup X)^\sharp$ denoted by $t$ consists
of well-ordered words. For the base cases, i.e. when $t = a$, $t = \varepsilon$ or $t = x$, the
claim clearly holds. If $t = t_1 + t_2$ or $t = t_1 \cdot t_2$, or $t = t_1^\omega$, for some expressions
$t_1, t_2$, our claim is again clear (using the fact that every well-ordered product of
well-ordered words is well-ordered in the last two cases). Finally, if $t = \mu x.t_1$,
where $t_1$ denotes an MCFL $L \subseteq (\Sigma \cup X)^\sharp$, then $|t|$ is the language $\bigcup_{\alpha \geq 0} L_\alpha$, where
$L_0 = \emptyset$ and for each $\alpha > 0$,

$$L_\alpha = L[x \mapsto L_{<\alpha}],$$

where $L_{<\alpha} = \bigcup_{\beta < \alpha} L_\beta$. Thus, if $L$ contains only well-ordered words then so
does each $L_\alpha$, since languages of well-ordered words are closed under
substitution.

For the other direction, we may restrict ourselves to expressions (of type $T$
or $P$) which do not have any subexpression denoting the empty set, nor any
subexpression other than $\varepsilon$ denoting $\{\varepsilon\}$.

Suppose that $t$ and $p$ are such expressions of type $T$ and $P$, respectively. It is
not difficult to prove the following claim by (simultaneous) structural induction:

Claim A. If $t$ has a subexpression (belonging to the syntactic category $P$) of the
form $t_1 \times t_2$ with $t_2 \neq \varepsilon$, then $|t|$ contains a word which is not well-ordered. If $p$
has a subexpression $p' = t_1 \times t_2$ with $t_2 \neq \varepsilon$, then $|p|$ contains a pair $(u, v)$ such
that either $v \neq \varepsilon$ or one of $u, v$ is not well-ordered.

To prove this, first note that $t$ cannot have the form $a$, $\varepsilon$ or $x$. When $p = t_1 \times t_2$,
for some $t_1, t_2$ with $t_2 \neq \varepsilon$, then our claim clearly holds for $p$, since either one of $|t_1|$
and $|t_2|$ contains a word which is not well-ordered, or $|t_2|$ contains a nonempty
word. The induction step is clear when $p = p_1 + p_2$, $p = p_1 \cdot p_2$, $p = p_1^*$, or when
$t = t_1 + t_2$, $t = t_1 \cdot t_2$, or $t = p^\omega$. When $t = \mu x.t_1$, then $|t_1|$ contains a word $u$
which is not well-ordered. Since by assumption $|t|$ contains a nonempty word $v$,
$|t|$ contains $u[x \mapsto v]$, which is not well-ordered.

To complete the proof, note that if each subexpression of $t$ of the form $t_1 \times t_2$
satisfies $t_2 = \varepsilon$, then we can transform $t$ into an equivalent $\mu\omega T_w$-expression by
repeatedly replacing subexpressions of the form $t_1 \times \varepsilon$ with $t_1$ and subexpressions
of the form $t_1^*$ with $\mu x.(t_1 x + \varepsilon)$. □



Using Claim A, we may develop a low-degree polynomial-time algorithm for
the following decision problem: given a closed $\mu\omega T_s$-expression $t$ of syntactic
category $T$, does the language denoted by $t$ consist of well-ordered words only?
The expression $t$ may be assumed to be given as an expression tree.

In the following, $t_1, t_2$ denote expressions belonging to the syntactic category $T$
and $p_1, p_2$ denote expressions of syntactic category $P$. Expressions $e, e_1, e_2$ are
arbitrary. We also allow the symbol $\emptyset$ to appear in expressions, which denotes
the empty language.

In the first step of the algorithm, we transform $t$ into an equivalent expression $t_\emptyset$
which is either the symbol $\emptyset$, or contains no subexpression denoting the empty
set. This can be done by a straightforward algorithm in linear time using the
fact that an expression of the form $\mu x.t_1$ denotes the empty language iff $t_1[x/\emptyset]$,
the expression obtained from $t_1$ by replacing each free occurrence of $x$ in $t_1$ by
$\emptyset$, denotes the empty language.
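The emptiness test behind this first step can be sketched as a recursion over the expression tree. The encoding below is an assumption of ours, not the paper's: expressions are nested tuples tagged `"empty"`, `"eps"`, `"letter"`, `"var"`, `"sum"`, `"cat"`, `"times"`, `"star"`, `"omega"` and `"mu"`, and the function name `is_empty` is hypothetical. The sketch only decides emptiness; it does not perform the rewriting into $t_\emptyset$.

```python
from typing import FrozenSet, Tuple

Expr = Tuple  # nested tuples such as ("mu", "x", ("sum", ("var", "x"), ("letter", "a")))

def is_empty(e: Expr, empty_vars: FrozenSet[str] = frozenset()) -> bool:
    """Decide whether e denotes the empty set, treating the variables in
    empty_vars as the empty language (all other free variables as nonempty)."""
    tag = e[0]
    if tag == "empty":
        return True
    if tag in ("eps", "letter"):
        return False
    if tag == "var":
        return e[1] in empty_vars
    if tag == "sum":
        return is_empty(e[1], empty_vars) and is_empty(e[2], empty_vars)
    if tag in ("cat", "times"):
        return is_empty(e[1], empty_vars) or is_empty(e[2], empty_vars)
    if tag == "star":
        return False  # the star always contains the pair (epsilon, epsilon)
    if tag == "omega":
        return is_empty(e[1], empty_vars)
    if tag == "mu":
        # mu x.t1 is empty iff t1 with x replaced by the empty set is empty
        return is_empty(e[2], empty_vars | {e[1]})
    raise ValueError(f"unknown expression tag: {tag!r}")
```

For example, $\mu x.x$ and $\mu x.(x \cdot a)$ are reported empty, while $\mu x.(x + a)$ is not.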

Suppose now that $t_\emptyset$ is not the symbol $\emptyset$, so that $|t_\emptyset|$ is not empty. We construct
another equivalent expression in which each subexpression of syntactic category
$T$ denoting $\{\varepsilon\}$ is $\varepsilon$ itself. To achieve this, we determine for each subexpression
$e$ of $t_\emptyset$ the set $\mathrm{SYMBOLS}(e) \subseteq \Sigma \cup X$ containing all the symbols that occur in
some word of $|e|$ (or in a word in a pair of $|e|$, if $e$ is of type $P$). The recursion
rules for this are:

$$\mathrm{SYMBOLS}(\varepsilon) = \emptyset, \quad \mathrm{SYMBOLS}(x) = \{x\}, \quad \mathrm{SYMBOLS}(a) = \{a\},$$
$$\mathrm{SYMBOLS}(e_1 e_2) = \mathrm{SYMBOLS}(e_1 + e_2) = \mathrm{SYMBOLS}(e_1) \cup \mathrm{SYMBOLS}(e_2),$$
$$\mathrm{SYMBOLS}(p_1^*) = \mathrm{SYMBOLS}(p_1^\omega) = \mathrm{SYMBOLS}(p_1),$$
$$\mathrm{SYMBOLS}(t_1 \times t_2) = \mathrm{SYMBOLS}(t_1) \cup \mathrm{SYMBOLS}(t_2),$$
$$\mathrm{SYMBOLS}(\mu x.t_1) = \mathrm{SYMBOLS}(t_1) - \{x\}.$$
Note that the correctness of these rules (e.g. the one for concatenation) depends
on the assumption that no subexpression of $t_\emptyset$ denotes the empty set.
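The recursion rules translate directly into a recursive function on expression trees. The sketch below uses an encoding that is our assumption: nested tuples tagged `"eps"`, `"letter"`, `"var"`, `"sum"`, `"cat"`, `"times"`, `"star"`, `"omega"` and `"mu"`, with letters and variables drawn from disjoint sets of names. It assumes, as in the text, that no subexpression denotes the empty set, and it uses plain immutable sets rather than the balanced trees needed for the stated time bound.

```python
from typing import FrozenSet, Tuple

Expr = Tuple  # e.g. ("mu", "x", ("sum", ("var", "x"), ("letter", "a")))

def symbols(e: Expr) -> FrozenSet[str]:
    """Set of letters and free variables occurring in some word (or pair) of |e|,
    assuming no subexpression of e denotes the empty set."""
    tag = e[0]
    if tag == "eps":
        return frozenset()
    if tag in ("letter", "var"):
        return frozenset({e[1]})
    if tag in ("sum", "cat", "times"):
        return symbols(e[1]) | symbols(e[2])
    if tag in ("star", "omega"):
        return symbols(e[1])
    if tag == "mu":
        return symbols(e[2]) - {e[1]}
    raise ValueError(f"unknown expression tag: {tag!r}")

def denotes_epsilon_only(e: Expr) -> bool:
    """For an expression of category T: |e| = {epsilon} iff SYMBOLS(e) is empty."""
    return not symbols(e)

# mu x.((x times x)^omega + a + b + epsilon), the expression of Example 5
example5 = ("mu", "x", ("sum", ("sum", ("sum",
            ("omega", ("times", ("var", "x"), ("var", "x"))),
            ("letter", "a")), ("letter", "b")), ("eps",)))
```

Applied to `example5`, the function returns the set containing exactly the letters a and b.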

Having computed $\mathrm{SYMBOLS}(e)$ for each subexpression $e$, observe that $|e| = \{\varepsilon\}$
for a subexpression $e$ of syntactic category $T$ if and only if $\mathrm{SYMBOLS}(e) = \emptyset$.
Hence, during the computation of $\mathrm{SYMBOLS}(\cdot)$, we can flag each subexpression of
$t_\emptyset$ of type $T$ by a bit indicating whether it denotes the language $\{\varepsilon\}$. Using this
information, we can then replace each maximal subexpression denoting $\{\varepsilon\}$ by
$\varepsilon$, yielding an equivalent expression $t_{\emptyset,\varepsilon}$ containing no occurrence of the symbol
$\emptyset$ such that each subexpression of type $T$ different from $\varepsilon$ denotes a language
containing at least one nonempty word. Applying now Claim A to $t_{\emptyset,\varepsilon}$, we get
the desired decision procedure answering the question whether the given closed
expression $t$ denotes a language of well-ordered words.

All steps can be performed in (deterministic) linear time in the usual RAM model
of computation, say, except for the computation of the function $\mathrm{SYMBOLS}(\cdot)$
whose time complexity depends on the data structure chosen for representing sets
of symbols. If this data structure is a self-balancing binary tree, which supports
the construction of $\emptyset$ and the singleton sets in constant time, the removal of
one element from an $n$-element set in $O(\log n)$ time and the construction of
the union of two sets with $n$ and $k$ elements in $O(\min\{n, k\} \cdot \log(n + k))$ time
(destroying the two sets, which is not a problem since only their emptiness flag is
needed later, which is already stored), respectively, then we get an overall time
complexity of $O(n \cdot \log n)$. Thus we have shown the following:

Corollary 1. The problem whether an arbitrary closed $\mu\omega T_s$-expression of
syntactic category $T$ denotes a language which consists of well-ordered words only
can be decided in $O(n \cdot \log n)$ time (in the usual RAM model of computation).